In this work, we study parameter tuning towards the M^2 metric, the standard metric for automatic grammatical error correction (GEC) tasks. After implementing M^2 as a scorer in the Moses tuning framework, we investigate interactions of dense and sparse features, different optimizers, and tuning strategies for the CoNLL-2014 shared task. We notice erratic behavior when optimizing sparse feature weights with M^2 and offer partial solutions. We find that a bare-bones phrase-based SMT setup with task-specific parameter tuning outperforms all previously published results for the CoNLL-2014 test set by a large margin (46.37% M^2 over the previous 41.75%, achieved by an SMT system with neural features) while being trained on the same, publicly available data. Our newly introduced dense and sparse features widen that gap, and we improve the state of the art to 49.49% M^2.